XML Lossy Text Compression: A Preliminary Study

Authors

  • Angela Bonifati
  • Marianna Lorusso
  • Domenica Sileo

Abstract

Lossy compression techniques have been applied to image and text compression, yielding compression factors vastly superior to those of lossless compression schemes. In this paper, we present a preliminary study of a set of semantics-preserving lossy transformations for XML documents. Inspired by earlier techniques such as lossy text compression and literate programming, we apply a simple algorithm to XML syntactic constructs to discard superfluous layout information and redundant text. The resulting XML retains both its human-readability and its machine-readability. Moreover, it can considerably reduce space occupancy and improve the effectiveness of conventional text compressors applied afterwards, making it a promising technology for several data management tasks.
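The abstract does not spell out the transformation rules. As a rough illustration of the layout-removal idea only (not the authors' algorithm, and not covering the removal of redundant text), the following Python sketch drops whitespace-only text between elements, assuming that whitespace carries no meaning, and then compares the zlib-compressed sizes of the original and transformed documents. The function name `strip_layout` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
import zlib


def strip_layout(xml_text: str) -> str:
    """Remove whitespace-only text and tails (indentation, newlines between tags).

    Hypothetical sketch: assumes layout whitespace is semantically insignificant,
    which holds for data-oriented XML but not for mixed content or
    xml:space="preserve". ElementTree's default parser also drops comments.
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Text directly inside an element, before its first child.
        if elem.text is not None and not elem.text.strip():
            elem.text = None
        # Text after an element's closing tag, before its next sibling.
        if elem.tail is not None and not elem.tail.strip():
            elem.tail = None
    return ET.tostring(root, encoding="unicode")


doc = """<library>
    <book id="1">
        <title>XML Compression</title>
    </book>
</library>"""

compact = strip_layout(doc)
print(compact)  # <library><book id="1"><title>XML Compression</title></book></library>

# Compare raw and zlib-compressed sizes before and after the transformation.
print(len(doc), len(zlib.compress(doc.encode("utf-8"))))
print(len(compact), len(zlib.compress(compact.encode("utf-8"))))
```

Stripping layout first and compressing afterwards mirrors the abstract's point that a lossy, semantics-preserving pre-pass can be combined with any conventional lossless compressor.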

Similar resources

Toward Remote Object Coherence with Compiled Object Serialization for Distributed Computing with XML Web Services

Cross-platform object-level coherence in Web services-based distributed systems and grids requires lossless serialization to ensure programming-language specific objects are safely transmitted, manipulated, and stored. However, Web services development tools often suffer from lossy forms of XML serialization, which diminishes the usefulness of XML Web services as a competitive approach to binar...

Semantic Lossy Compression of XML Data

In the last years a large amount of semistructured data [1, 10] has been managed and exchanged. The largest repository of semistructured data is the World Wide Web, which can be thought of as an enormous database in which data is highly heterogeneous and freely correlated. In this scenario is placed Extensible Markup Language (XML) [14], a language for semistructured data standardised by the Wo...

Compact XML grammar based compression

Extensible Markup Language (XML) is the standard format for content representation and sharing on the Web. XML is a highly verbose language, especially regarding the duplication of meta-data in the form of elements and attributes. As XML content is becoming more widespread so is the demand to compress XML data volume. This paper presents a new grammar, called D-grammar, which defines XML struct...

XML Structure Compression

XML is becoming the universal language for communicating information on the Web and has gained wide acceptance through its standardisation. As such XML plays an important enabling role for dynamic computation over the Web. Compression of XML documents is crucial in this process as, in its raw form, it often contains a sizable amount of redundancy. Several XML compression algorithms have been pr...

Squeezex: Synthesis and Compression of XML Data

XML is emerging as the “universal” language for semistructured data description/exchange, and new issues regarding the management of XML data, both in terms of performance and usability, are becoming critical. The application of knowledge-based synthesization and compression methods (i.e. derivation of synthetic views and lossless/lossy approximation of contents) can be extremely beneficial in ...

Publication date: 2009